Deciphering the Structural and Chemical Transformations of Oxide Catalysts during Oxygen Evolution Reaction Using Quick X-ray Absorption Spectroscopy and Machine Learning

Bimetallic transition-metal oxides, such as spinel-like CoxFe3–xO4 materials, are known as attractive catalysts for the oxygen evolution reaction (OER) in alkaline electrolytes. Nonetheless, unveiling the real active species and active states in these catalysts remains a challenge. The coexistence of metal ions in different chemical states and in different chemical environments, including disordered X-ray amorphous phases that all evolve under reaction conditions, hinders the application of common operando techniques. Here, we address this issue by relying on operando quick X-ray absorption fine structure spectroscopy, coupled with unsupervised and supervised machine learning methods. We use principal component analysis to understand the subtle changes in the X-ray absorption near-edge structure spectra and develop an artificial neural network to decipher the extended X-ray absorption fine structure spectra. This allows us to separately track the evolution of tetrahedrally and octahedrally coordinated species and to disentangle the chemical changes and several phase transitions taking place in CoxFe3–xO4 catalysts and on their active surface, related to the conversion of disordered oxides into spinel-like structures, transformation of spinels into active oxyhydroxides, and changes in the degree of spinel inversion in the course of the activation treatment and under OER conditions. By correlating the revealed structural changes with the distinct catalytic activity for a series of CoxFe3–xO4 samples, we elucidate the active species and OER mechanism.


Supplementary Note 1: Principal component analysis of XANES data
The objective of principal component analysis (PCA) is to express the experimental XANES spectra $\mu_i(E)$ in a dataset as linear combinations of a few linearly independent vectors, the principal components $u_j(E)$:

$$\mu_i(E) - \bar{\mu}(E) = \sum_{j=1}^{d} w_{ij}\, u_j(E). \qquad \text{(S1)}$$

For convenience, we follow a common practice in PCA here and subtract the averaged spectrum $\bar{\mu}(E)$, calculated over the entire dataset, from the experimental spectra $\mu_i(E)$.[1,2] This practice is not always observed in PCA-XANES analysis. PCA is analogous to linear combination analysis (LCA), which is commonly used to interpret XANES data. In LCA, an experimentally measured XANES spectrum is expressed as a linear combination of spectra for a few reference compounds. The important difference is that in PCA the principal components (PCs) used for the linear combinations are abstract vectors that do not correspond to any particular pure species. On the other hand, PCs are deduced from the experimental dataset itself using linear algebra methods. PCA therefore works even when LCA cannot be applied because suitable reference spectra are lacking. An important objective of PCA is to determine the number of principal components d needed to reproduce all the meaningful spectroscopic variations in the experimental dataset. This number defines the dimensionality of the dataset and is thus related to the number of spectroscopically distinct species present in the course of the reaction. The actual number of species in the sample may be larger, however. This happens if the spectra of two or more distinct species are very similar, or if the concentrations of two species remain proportional to each other throughout the experiment.
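The dimensionality d can be estimated from the singular values of the mean-subtracted data matrix. The sketch below uses synthetic spectra and a hypothetical 99.9% cumulative-variance cut-off; the document does not state the criterion the authors used. It also shows why d is only *related* to the number of species. Once the averaged spectrum is subtracted, n species whose fractions always sum to one span at most n − 1 directions.

```python
import numpy as np

def pca_dimensionality(spectra, var_threshold=0.999):
    """Estimate how many PCs are needed to reproduce a set of normalized XANES spectra.

    spectra: (n_spectra, n_energies) array, one spectrum per row.
    var_threshold: hypothetical cut-off on the cumulative fraction of variance explained.
    """
    centered = spectra - spectra.mean(axis=0)          # subtract the averaged spectrum
    s = np.linalg.svd(centered, compute_uv=False)      # singular values, descending order
    explained = s**2 / np.sum(s**2)                    # variance fraction carried by each PC
    d = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    return d, explained

# Synthetic dataset: two hypothetical species whose fractions always sum to one
rng = np.random.default_rng(0)
energy = np.linspace(-20.0, 60.0, 400)                 # E - E0, eV
species_a = 1.0 / (1.0 + np.exp(-energy / 2.0))
species_b = 1.0 / (1.0 + np.exp(-(energy - 4.0) / 2.0)) + 0.3 * np.exp(-((energy - 10.0) / 3.0) ** 2)
fraction_b = np.linspace(0.0, 1.0, 50)
spectra = np.outer(1.0 - fraction_b, species_a) + np.outer(fraction_b, species_b)
spectra += 1e-4 * rng.standard_normal(spectra.shape)   # small measurement noise

d, explained = pca_dimensionality(spectra)
print(d)   # 1: after mean subtraction, two complementary species span a single direction
```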
The principal components and the corresponding projections of the experimental spectra on the PCs (see Eq. (S1)) can be obtained by performing a singular value decomposition (SVD) of the matrix formed by all the experimental spectra.[1,3] For this purpose, we used a combined dataset containing the normalized Co K-edge XANES spectra for CoOx, Co2.25Fe0.75O4 and CoFe2O4, shown in Figure 1a in the main text and Figure S1a,b. We do not include the spectra for Co0.25Fe2.75O4 here because of the lower signal-to-noise ratio for this dilute sample. Nonetheless, once the PCs had been obtained from the data for CoOx, Co2.25Fe0.75O4 and CoFe2O4, we found that all the spectra for Co0.25Fe2.75O4 can also be accurately expressed as linear combinations of these PCs (Figure S2). This suggests that this sample contains no spectroscopically unique species beyond those already present in the datasets for CoOx, Co2.25Fe0.75O4 or CoFe2O4.
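The check described above has two steps. PCs are built from one dataset, and the spectra of another sample are then expressed through those fixed PCs and the size of the residual is inspected. The following sketch illustrates this with toy Gaussian "spectra". All species, noise levels and the residual comparison are hypothetical, chosen only to show the logic, and are not the authors' data or acceptance criterion.

```python
import numpy as np

def fit_pcs(spectra, d):
    """Mean spectrum and first d principal components (orthonormal rows) of a reference dataset."""
    mean_spectrum = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean_spectrum, full_matrices=False)
    return mean_spectrum, vt[:d]

def project(spectra, mean_spectrum, pcs):
    """Weights of new spectra on fixed PCs and the residual left after reconstruction."""
    weights = (spectra - mean_spectrum) @ pcs.T      # orthonormal PCs: projection is a dot product
    residual = spectra - (mean_spectrum + weights @ pcs)
    return weights, residual

def rms(x):
    return float(np.sqrt(np.mean(x**2)))

def peak(center, energy=np.linspace(0.0, 100.0, 500)):
    """Toy spectral feature (Gaussian) used as a stand-in for a species' spectrum."""
    return np.exp(-((energy - center) / 5.0) ** 2)

rng = np.random.default_rng(1)
known = np.vstack([peak(20.0), peak(40.0), peak(60.0)])   # three hypothetical species
new = peak(80.0)                                          # species absent from PC construction

# PCs from a low-noise dataset (fractions sum to one, so d = 2 after mean subtraction)
train = rng.dirichlet(np.ones(3), size=60) @ known + 1e-4 * rng.standard_normal((60, 500))
mean_spectrum, pcs = fit_pcs(train, d=2)

# Noisier "dilute" dataset made of the same species: residual stays at the noise level
dilute = rng.dirichlet(np.ones(3), size=20) @ known + 1e-3 * rng.standard_normal((20, 500))
_, res_known = project(dilute, mean_spectrum, pcs)

# Dataset containing the unseen species: the residual becomes much larger
unseen = 0.7 * rng.dirichlet(np.ones(3), size=20) @ known + 0.3 * new
_, res_unseen = project(unseen, mean_spectrum, pcs)
print(rms(res_known), rms(res_unseen))
```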
We note that if the spectra for the Co0.25Fe2.75O4 sample are included in the construction of the PCs, the current third principal component PC-3 becomes the fourth most significant PC (PC-4'), while the new third PC (PC-3') has no pronounced spectroscopic features (Figure S6a). Moreover, the weight of PC-3' changes unsystematically from one sample and spectrum to another (Figure S6b-e). We thus conclude that PC-3' corresponds to experimental noise. In this particular case, the noise happens to contribute more to the experimental data than the physically meaningful PC-4'.
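The argument above separates a physical PC from noise by how systematically its weight evolves. One simple way to quantify this, assumed here and not necessarily the criterion the authors applied, is the lag-1 autocorrelation of the PC weights ordered in time. A smooth trend gives a value close to 1, while uncorrelated noise gives a value near 0. The weight series below are synthetic.

```python
import numpy as np

def lag1_autocorrelation(weights):
    """Lag-1 autocorrelation of PC weights ordered in time (near 1 for smooth trends, near 0 for noise)."""
    w = np.asarray(weights, dtype=float)
    w = w - w.mean()
    return float(np.sum(w[1:] * w[:-1]) / np.sum(w * w))

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)                                            # normalized time
systematic = np.tanh(5.0 * (t - 0.5)) + 0.05 * rng.standard_normal(t.size)  # e.g. a phase transition
unsystematic = 0.05 * rng.standard_normal(t.size)                           # noise-like PC weight
print(lag1_autocorrelation(systematic), lag1_autocorrelation(unsystematic))
```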

Supplementary Note 2: Details of neural network training and validation
A heterogeneous material may contain several different phases and/or metal sites within the same phase with crystallographically non-equivalent environments. Its total EXAFS spectrum $\chi(k)$ can then be expressed as a sum of the contributions from all non-equivalent sites, $\chi(k) = \sum_s c_s \chi_s(k)$. Here, $c_s$ is the concentration of the s-th species (site), and $\chi_s(k)$ is the partial spectrum associated with the s-th species. Each $\chi_s(k)$ can, in turn, be expressed as a sum of contributions from different photoelectron scattering paths, $\chi_s(k) = \sum_p \chi_{s,p}(k)$. This summation includes both single-scattering and multiple-scattering paths. The contributions of single-scattering paths can be directly linked to the partial radial distribution functions (RDFs) $g_{s,p}(R)$ as

$$\chi_{s,p}(k) = S_0^2 \int_0^{R_{\max}} \frac{g_{s,p}(R)}{kR^2}\left[F_p(k,R)\sin(2kR) + G_p(k,R)\cos(2kR)\right]\mathrm{d}R.$$

Here $F_p$ and $G_p$ represent the real and imaginary parts of the photoelectron scattering function. R is the interatomic distance between the absorbing metal atom and a neighbouring atom of the p-th type (e.g., oxygen or another metal), and $S_0^2$ is the amplitude reduction factor accounting for many-electron effects. The functions $g_{s,p}(R)$ are not arbitrary: they must describe physically reasonable bond-length distributions around the s-th species. Given this constraint, the equations above define a one-to-one correspondence between the EXAFS spectrum $\chi(k)$ and the set of all partial RDFs: $\{g_{s,p}(R)\} \leftrightarrow \chi(k)$. Importantly, this correspondence holds only if the $g_{s,p}(R)$ of one species s cannot be obtained simply by multiplying the $g_{s',p}(R)$ of another species s' by a constant. Otherwise the situation would be ambiguous: an infinite number of combinations of $g_{s,p}(R)$ and $g_{s',p}(R)$ with different weights $c_s$ and $c_{s'}$ would give the same total $\chi(k)$ spectrum.
In our case, this condition means that the mapping between the partial RDFs and the EXAFS spectrum is unambiguous only if the analysis is extended to distant coordination shells. The reason is that the bond-length distributions within the first coordination shell are often very similar for different species s, corresponding, e.g., to different oxide phases.
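The ambiguity described above can be reproduced numerically. The sketch below assumes a single-scattering relation of the form $\chi(k) = S_0^2 \int g(R)/(kR^2)\,[F\sin 2kR + G\cos 2kR]\,\mathrm{d}R$, i.e. $\mathrm{Im}\{(F+iG)e^{2ikR}\}$. It uses hypothetical smooth functions in place of the real scattering amplitudes, which would normally come from FEFF. Two species with proportional first-shell RDFs yield identical spectra for different $(c_s, c_{s'})$ pairs. Adding distinct second-shell contributions (hypothetical distances) removes the degeneracy.

```python
import numpy as np

K = np.linspace(3.0, 12.0, 200)      # photoelectron wavenumber, 1/Å
R = np.linspace(1.0, 4.0, 601)       # distance grid, Å
DR = R[1] - R[0]
S0_SQ = 0.8                          # hypothetical amplitude reduction factor

def toy_scattering(k):
    """Hypothetical real (F) and imaginary (G) parts of a scattering function; NOT FEFF output."""
    damping = np.exp(-0.02 * k**2)
    return damping * np.cos(0.3 * k), damping * np.sin(0.3 * k)

def chi_from_rdf(g):
    """chi(k) = S0^2 * integral of g(R)/(k R^2) * [F sin(2kR) + G cos(2kR)] dR (rectangle rule)."""
    F, G = toy_scattering(K)
    kk, rr = np.meshgrid(K, R, indexing="ij")
    kernel = (F[:, None] * np.sin(2 * kk * rr) + G[:, None] * np.cos(2 * kk * rr)) / (kk * rr**2)
    return S0_SQ * (kernel @ g) * DR

def gaussian_rdf(n, r0, sigma):
    """Gaussian bond-length distribution whose integral equals the coordination number n."""
    return n * np.exp(-0.5 * ((R - r0) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Two hypothetical species with proportional first-shell RDFs (6 vs 4 neighbours at the same distance)
chi_s = chi_from_rdf(gaussian_rdf(6, 1.95, 0.07))
chi_sp = chi_from_rdf(gaussian_rdf(4, 1.95, 0.07))        # equals (2/3) * chi_s
same = np.allclose(1.0 * chi_s + 0.0 * chi_sp, 0.5 * chi_s + 0.75 * chi_sp)
print(same)        # True: (c_s, c_s') = (1, 0) and (0.5, 0.75) cannot be told apart

# Distinct (hypothetical) second-shell metal-metal distances make the partial RDFs non-proportional
chi_s_full = chi_s + chi_from_rdf(gaussian_rdf(12, 2.95, 0.08))
chi_sp_full = chi_sp + chi_from_rdf(gaussian_rdf(12, 3.45, 0.08))
same_full = np.allclose(1.0 * chi_s_full, 0.5 * chi_s_full + 0.75 * chi_sp_full)
print(same_full)   # False
```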
The one-to-one correspondence between EXAFS spectra and the set of RDFs can be inverted using a neural network (NN). The relationship between the spectral features and the RDFs is established during NN training. In this procedure, a large set of theoretical EXAFS spectra with known corresponding RDFs is used to optimize the NN parameters.
To construct a dataset for NN training, we follow the approach introduced in our previous works.[3-5] We first generate pairs of theoretical site-specific EXAFS spectra and corresponding RDFs for all unique metal sites in a set of relevant Co and Fe oxides, hydroxides and oxyhydroxides. For Co, we consider the following sites:
- octahedrally coordinated Co species in rocksalt-type Co(II) oxide (CoO-rs), Co(OH)2 and CoOOH;
- tetrahedrally coordinated Co species in wurtzite-type CoO (CoO-w);
- tetrahedrally and octahedrally coordinated Co sites in Co3O4.

For Fe, we consider:
- octahedrally coordinated Fe species in FeO, α-Fe2O3 and FeOOH;
- tetrahedrally and octahedrally coordinated Fe sites in spinel-type Fe3O4 and γ-Fe2O3.

For each of these well-defined reference compounds, we sampled atomic configurations using molecular dynamics (MD) or Monte Carlo (MC) methods with empirical force-field models. Whenever available, we used force-field models from the literature:
- the Buckingham-type potential developed by Lewis and Catlow, used here for rocksalt-type FeO and CoO;[6]
- the potential developed by Cooke et al., used here for γ-Fe2O3;[7]
- the potential for α-Fe2O3 from Erlebach et al.;[8]
- an exponential shell potential for Co3O4 from Hu et al.[9]

For the structure models lacking literature potentials, we found that good results can be obtained with a simple Lennard-Jones-type potential, $V(r) = 4\varepsilon\left[\left(\sigma/r\right)^{12} - \left(\sigma/r\right)^{6}\right]$. The parameter σ is chosen to match the equilibrium distance between nearest neighbors in the given reference material. The parameter ε (potential depth) is optimized so that the EXAFS spectra calculated with this potential match the available room-temperature EXAFS spectra for this reference material as closely as possible. Clearly, this simple approach cannot yield a force-field model that reproduces all physical properties of the material.
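The Lennard-Jones potential has its minimum at $r = 2^{1/6}\sigma$. One way to "match the equilibrium distance" is therefore to set $\sigma = r_{nn}/2^{1/6}$. We assume this reading here; the text does not spell out the exact relation. The nearest-neighbour distance and well depth below are hypothetical placeholders. In the actual procedure, ε would be tuned against measured EXAFS data.

```python
import numpy as np

def lj_potential(r, epsilon, sigma):
    """Lennard-Jones pair potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

def sigma_for_nn_distance(r_nn):
    """Assumed choice: place the LJ minimum (at r = 2**(1/6) * sigma) at the nearest-neighbour distance."""
    return r_nn / 2.0 ** (1.0 / 6.0)

r_nn = 2.10        # hypothetical nearest-neighbour distance, Å
epsilon = 0.5      # hypothetical well depth, eV (to be tuned against EXAFS)
sigma = sigma_for_nn_distance(r_nn)

r = np.linspace(1.7, 3.0, 13001)
v = lj_potential(r, epsilon, sigma)
r_min, v_min = r[np.argmin(v)], v.min()
print(r_min, v_min)   # minimum at ~2.10 Å with depth ~ -epsilon
```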
Nonetheless, this simple approach is sufficient for our purposes, since it allows us to generate the realistic-looking EXAFS spectra and corresponding RDFs needed for NN training. MD and MC simulations were carried out with the GULP code[10] at different temperatures to account for different degrees of structural disorder. The resulting atomic coordinates were further rescaled isotropically by factors between 0.95 and 1.05 to account for possible variations in the lattice constants. For each of the resulting ca. 20000 structure models, we calculated the corresponding partial metal-oxygen and metal-metal RDFs.
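The steps above end with two operations: isotropic rescaling of a snapshot and computation of a partial RDF. They can be sketched as follows. A noisy rocksalt-like cluster stands in for an MD/MC frame; it is hypothetical, as are the lattice distance, bin width and cut-off. The RDF is a plain histogram without periodic boundaries, normalized so that its integral gives the average coordination number. Surface atoms of the finite cluster therefore lower that number.

```python
import numpy as np

def partial_rdf(positions, species, center, neighbour, r_max=6.0, n_bins=120):
    """Histogram-based partial RDF g(R) for center-neighbour pairs (cluster, no periodic images).

    Normalized per center atom and per bin width, so its integral equals the average coordination number.
    """
    centers = positions[species == center]
    neighbours = positions[species == neighbour]
    dist = np.linalg.norm(centers[:, None, :] - neighbours[None, :, :], axis=-1).ravel()
    dist = dist[(dist > 1e-8) & (dist < r_max)]            # exclude self-pairs
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    width = edges[1] - edges[0]
    return 0.5 * (edges[1:] + edges[:-1]), counts / (len(centers) * width)

rng = np.random.default_rng(3)
a = 2.13                                   # hypothetical M-O nearest-neighbour distance, Å
idx = np.arange(-3, 4)
grid = np.array(np.meshgrid(idx, idx, idx, indexing="ij")).reshape(3, -1).T   # 7x7x7 rocksalt-like cluster
species = np.where(grid.sum(axis=1) % 2 == 0, "M", "O")
snapshot = a * grid + 0.05 * rng.standard_normal(grid.shape)                  # stands in for an MD/MC frame

scale = rng.uniform(0.95, 1.05)            # isotropic rescaling factor
r, g_mo = partial_rdf(scale * snapshot, species, "M", "O")
first_shell_cn = g_mo[r < 2.6].sum() * (r[1] - r[0])
print(scale, first_shell_cn)               # CN below 6 because of surface atoms
```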
Theoretical EXAFS spectra for the structures sampled by MD and MC were calculated using the EvAX code,[11] which itself uses the FEFF-8 code[12] for ab initio calculations of EXAFS spectra. Pairs of calculated site-specific EXAFS spectra $\chi_i(k)$ and partial RDF sets $\vec{g}_i$ were then constructed, where $\vec{g}_i$ is a vector consisting of four concatenated partial RDFs. These site-specific spectra and RDFs, corresponding to unique sites in pure compounds, were then mixed linearly to obtain training pairs for mixtures. For this purpose, we randomly selected three pairs $\{\chi_i(k), \vec{g}_i\}$ and constructed their linear combination with random weights $w_{ji}$. This gives the spectrum of a mixture of species, $\tilde{\chi}_j(k) = \sum_i w_{ji}\chi_i(k)$, and the corresponding concatenated RDF vector, $\tilde{\vec{g}}_j = \sum_i w_{ji}\vec{g}_i$. We repeated this process 10000 times, obtaining 10000 pairs of spectra and RDFs for mixtures.
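The mixing step above can be written compactly. In this sketch the weights are drawn from a Dirichlet distribution so that they sum to one. The three pairs are chosen without replacement. Both choices are assumptions, as the text only says the weights are random. The input arrays are random stand-ins for the FEFF-calculated spectra and concatenated RDFs.

```python
import numpy as np

def make_mixtures(site_spectra, site_rdfs, n_mixtures=10000, n_components=3, seed=0):
    """Linearly mix site-specific (chi_i, g_i) pairs into (chi~_j, g~_j) training pairs.

    site_spectra: (n_sites, n_k) array of chi_i(k); site_rdfs: (n_sites, n_g) array of RDF vectors g_i.
    Assumption: Dirichlet weights (summing to one) and selection without replacement.
    """
    rng = np.random.default_rng(seed)
    n_sites = site_spectra.shape[0]
    mixed_chi = np.empty((n_mixtures, site_spectra.shape[1]))
    mixed_g = np.empty((n_mixtures, site_rdfs.shape[1]))
    for j in range(n_mixtures):
        idx = rng.choice(n_sites, size=n_components, replace=False)   # three randomly selected pairs
        w = rng.dirichlet(np.ones(n_components))                      # random weights w_ji
        mixed_chi[j] = w @ site_spectra[idx]                          # chi~_j(k) = sum_i w_ji chi_i(k)
        mixed_g[j] = w @ site_rdfs[idx]                               # g~_j = sum_i w_ji g_i
    return mixed_chi, mixed_g

rng = np.random.default_rng(6)
site_chi = rng.standard_normal((20, 50))           # stand-ins for 20 site-specific chi_i(k)
site_g = np.abs(rng.standard_normal((20, 40)))     # stand-ins for concatenated RDF vectors g_i
chi_mix, g_mix = make_mixtures(site_chi, site_g, n_mixtures=100)
```

Because the same weights are applied to spectra and RDFs, the linear relationship between each spectrum and its RDF vector is preserved in the mixtures.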
Our approach can be expected to interpret EXAFS spectra correctly only when they correspond to structures not too different from those considered in constructing the training dataset. In the present case, these are, for instance, mixtures of the structural motifs encountered in common Co and Fe oxide materials. Some hypothetical amorphous structures with unique bonding motifs could thus be missed. Unfortunately, this limitation is unavoidable. In general, extracting RDFs from EXAFS spectra is an ill-posed problem unless additional assumptions are made about which solutions are feasible. Without restrictions on the shapes of the RDFs, many different RDFs can produce similar EXAFS spectra. As a result, one needs to define which solutions to consider. In our approach, this is done automatically through the choice of the training dataset, which limits the applicability of the method to a certain class of materials.
The NN is then constructed as a mathematical function $H(\vec{W}, \chi) \to \vec{g}$ that depends on a large set of parameters $\vec{W}$. It maps the EXAFS spectrum χ, provided as its input, to an output vector $\vec{g}$. During training, we provide the spectra $\tilde{\chi}_j(k)$ from the training set as inputs to the function H. We then tune the weights $\vec{W}$ so that the NN outputs match the known corresponding vectors $\tilde{\vec{g}}_j$ as closely as possible. After training is completed, the weights $\vec{W}$ are fixed. The NN can then take experimental EXAFS spectra as input and return the corresponding sets of partial RDFs.
The NN for EXAFS data interpretation was constructed and trained similarly to our previous works.[4,5] It consists of an input layer, several hidden layers and an output layer. The nodes in the input layer are initialized with the analyzed EXAFS data, and the output layer yields the vector of concatenated partial RDFs. The hidden layers increase the flexibility of the mathematical model represented by the NN and allow it to map the strongly non-linear relationships between the input and output vectors.
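The mapping $H(\vec{W}, \chi) \to \vec{g}$ and its training can be illustrated with a minimal fully connected network written in plain NumPy. It has tanh hidden layers, a linear output layer, full-batch gradient descent and a mean-squared-error loss. The layer sizes, learning rate, number of steps and the synthetic non-linear target are all hypothetical. This is not the architecture or training setup used for NN-EXAFS.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights and zero biases for a fully connected network with layer sizes `sizes`."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Activations of all layers: tanh in hidden layers, linear output layer."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts

def train_step(params, x, y, lr=0.05):
    """One full-batch gradient-descent step on the mean squared error; returns the loss before the step."""
    acts = forward(params, x)
    n = x.shape[0]
    loss = float(np.sum((acts[-1] - y) ** 2) / n)
    delta = 2.0 * (acts[-1] - y) / n                       # dLoss/d(output layer pre-activation)
    for i in reversed(range(len(params))):
        w, b = params[i]
        grad_w, grad_b = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:
            delta = (delta @ w.T) * (1.0 - acts[i] ** 2)   # back through tanh of the previous layer
        params[i] = (w - lr * grad_w, b - lr * grad_b)
    return loss

# Hypothetical stand-ins: 30 "spectral features" in, 10 "RDF values" out
rng = np.random.default_rng(4)
x_train = rng.standard_normal((200, 30))
a_true = rng.standard_normal((30, 16)) / np.sqrt(30)
b_true = rng.standard_normal((16, 10)) / 4.0
y_train = np.tanh(x_train @ a_true) @ b_true               # non-linear mapping to be learned

params = init_mlp([30, 64, 64, 10], rng)                    # input, two hidden layers, output
losses = [train_step(params, x_train, y_train) for _ in range(1000)]
prediction = forward(params, x_train[:1])[-1]               # weights fixed: apply to an input
print(losses[0], losses[-1])
```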
In our previous works, an EXAFS spectrum was pre-processed with the Morlet wavelet transform before being provided to the NN as input.[4,13] In this work, we instead used the short-time Fourier transform (STFT). The STFT can be considered a special case of the Morlet wavelet transform in which the width of the wavelet function is fixed rather than scaled with frequency. The STFT was carried out in the k-range between 3 and 9 Å⁻¹. We checked how the chosen k-range affects the stability and accuracy of the NN predictions and found no significant improvement when it was extended to 12 Å⁻¹. For the NN input, we used selected STFT coefficients (both real and imaginary parts) that were most sensitive to the details of the local structure. In total, the NN input layer contained 2165 nodes.
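A fixed-width STFT of $\chi(k)$ can be sketched as follows. It uses a Gaussian window whose width does not change with the window position, unlike a Morlet wavelet, and the $e^{-2ikR}$ kernel conventional in EXAFS Fourier analysis. The window width, k-weight, window positions, R grid and the single-shell test spectrum are assumptions. The authors' selection of the 2165 most sensitive coefficients is not reproduced; all coefficients are simply flattened.

```python
import numpy as np

def stft_exafs(k, chi, k_centers, r_values, window_width=1.0, k_weight=2):
    """Fixed-width Gaussian-window STFT of k^w * chi(k); returns (len(k_centers), len(r_values)) complex."""
    dk = k[1] - k[0]
    signal = chi * k**k_weight
    phase = np.exp(-2j * np.outer(k, r_values))                  # EXAFS convention: conjugate variable 2R
    coeffs = np.empty((len(k_centers), len(r_values)), dtype=complex)
    for i, kc in enumerate(k_centers):
        window = np.exp(-0.5 * ((k - kc) / window_width) ** 2)   # same width for every kc
        coeffs[i] = (signal * window) @ phase * dk
    return coeffs

k = np.linspace(3.0, 9.0, 301)                                   # k-range used for the NN input
chi = np.sin(2 * k * 2.0) * np.exp(-2 * 0.005 * k**2) / k        # hypothetical single shell at 2.0 Å
k_centers = np.linspace(3.0, 9.0, 13)
r_values = np.linspace(0.5, 6.0, 111)
coeffs = stft_exafs(k, chi, k_centers, r_values)
features = np.concatenate([coeffs.real.ravel(), coeffs.imag.ravel()])   # real and imaginary parts
print(features.size)                                             # 2 * 13 * 111 = 2886
```

For the window centred at k = 6 Å⁻¹, the modulus of the coefficients peaks near R = 2.0 Å, the distance used to build the test spectrum.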
The output layer of our NN represents four concatenated RDFs. To estimate the uncertainties, we constructed and independently trained 10 different NNs. The NN-EXAFS results reported here and in the main text are the average predictions of these 10 NNs. The standard deviations of the NN predictions are used as estimators of the statistical uncertainty.
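The ensemble estimate described above reduces to a mean and a standard deviation over the predictions of independently trained networks. In this sketch the networks are hypothetical callables that shift their input slightly. The use of the sample standard deviation (`ddof=1`) is our assumption.

```python
import numpy as np

def ensemble_prediction(networks, x):
    """Mean and standard deviation over independently trained networks (callables x -> RDF vector)."""
    outputs = np.stack([net(x) for net in networks])    # shape (n_networks, n_outputs)
    return outputs.mean(axis=0), outputs.std(axis=0, ddof=1)

# Hypothetical stand-ins for 10 trained NNs whose predictions differ slightly
networks = [lambda x, shift=0.01 * i: x + shift for i in range(10)]
mean_rdf, sigma_rdf = ensemble_prediction(networks, np.zeros(5))
```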
To validate the NNs and assess systematic errors, we tested the NN-EXAFS method on a set of reference spectra for standard materials. For these materials, the RDFs can be obtained independently using reverse Monte Carlo (RMC) simulations.[11,15] RMC is an EXAFS fitting approach in which the atomic coordinates of a 3D structure model of the material are iteratively optimized. The optimization continues until the experimental EXAFS spectrum agrees well with the theoretical EXAFS spectrum calculated for the structure model. As shown in Figure S12, RMC yields structure models in excellent agreement with the experimental EXAFS data. It also accounts for multiple-scattering effects, distant coordination shells and non-Gaussian bond-length distributions. As demonstrated in Figure S10 and Figure 4 in the main text, the RDFs yielded by the NN-EXAFS method agree well with those from the RMC-EXAFS simulations.
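The RMC idea described above can be illustrated with a classical Metropolis-style sketch. Random single-atom moves are accepted if they reduce the misfit to a target spectrum, or occasionally otherwise. EvAX itself uses an evolutionary-algorithm variant, which is not reproduced here. The "EXAFS" model is a hypothetical bare $\sin(2kR)/(kR^2)$ sum without real amplitudes or phases. The reference structure, step size and temperature are also assumptions.

```python
import numpy as np

K = np.linspace(3.0, 12.0, 200)

def toy_chi(positions, r_max=4.0):
    """Hypothetical single-scattering chi(k) averaged over all atoms as absorbers: sum of sin(2kR)/(kR^2)."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    r = d[np.triu_indices(len(positions), k=1)]
    r = r[r < r_max]
    # every pair is seen from both of its atoms, hence the factor 2 / N
    return 2.0 / len(positions) * np.sum(np.sin(2.0 * np.outer(K, r)) / (K[:, None] * r**2), axis=1)

def rmc_fit(chi_target, positions, n_steps=3000, step=0.05, temperature=1e-4, seed=0):
    """Reverse Monte Carlo: random single-atom moves accepted with a Metropolis-like rule on the misfit."""
    rng = np.random.default_rng(seed)
    pos = positions.copy()
    cost = float(np.mean((toy_chi(pos) - chi_target) ** 2))
    initial_cost, best_cost, best_pos = cost, cost, pos.copy()
    for _ in range(n_steps):
        trial = pos.copy()
        trial[rng.integers(len(pos))] += step * rng.standard_normal(3)   # move one random atom
        trial_cost = float(np.mean((toy_chi(trial) - chi_target) ** 2))
        if trial_cost < cost or rng.random() < np.exp(-(trial_cost - cost) / temperature):
            pos, cost = trial, trial_cost
            if cost < best_cost:
                best_cost, best_pos = cost, pos.copy()
    return best_pos, initial_cost, best_cost

# "Experimental" target from a hypothetical reference structure (simple cubic, 2.0 Å spacing)
idx = np.arange(3)
grid = np.array(np.meshgrid(idx, idx, np.arange(2), indexing="ij")).reshape(3, -1).T
reference = 2.0 * grid.astype(float)
chi_exp = toy_chi(reference)
start = reference + 0.15 * np.random.default_rng(5).standard_normal(reference.shape)
fitted, cost0, cost_best = rmc_fit(chi_exp, start)
print(cost0, cost_best)
```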

Supplementary Note 3: Effect of the Fe K-edge EXAFS on the Co K-edge EXAFS
Some materials contain elements that are neighbors in the Periodic Table. In EXAFS spectra of such materials, EXAFS oscillations of the lighter element (Fe in our case) can sometimes "leak" into the EXAFS spectrum of the heavier element (Co in our case). This is especially likely if the concentration of the lighter element is much higher than that of the heavier element.[16] It could distort the Co K-edge EXAFS spectra of Co-poor materials.
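To see where the Fe oscillations fall in Co K-edge data, the photon energy can be converted to photoelectron wavenumbers relative to each edge using $k = \sqrt{2m_e(E-E_0)}/\hbar$. The edge energies are the tabulated Fe (7112 eV) and Co (7709 eV) K-edges. The 3-12 Å⁻¹ Co window is chosen only for illustration. The result shows that the Fe EXAFS overlapping this window comes from Fe k-values of roughly 13-17 Å⁻¹. There the oscillation amplitude is normally strongly damped, although an 11:1 Fe:Co weighting amplifies it.

```python
import numpy as np

E_FE_K = 7112.0    # Fe K-edge energy, eV
E_CO_K = 7709.0    # Co K-edge energy, eV
K_FACTOR = 0.2625  # 2 m_e / hbar^2, in eV^-1 Å^-2

def k_from_energy(energy, e0):
    """Photoelectron wavenumber (1/Å) for photon energy `energy` above an edge at e0 (eV)."""
    return np.sqrt(K_FACTOR * (energy - e0))

def energy_from_k(k, e0):
    """Inverse of k_from_energy."""
    return e0 + k**2 / K_FACTOR

# Where does a Co K-edge window of 3-12 Å^-1 fall on the Fe K-edge k-scale?
k_co = np.array([3.0, 12.0])
k_fe = k_from_energy(energy_from_k(k_co, E_CO_K), E_FE_K)
print(np.round(k_fe, 1))   # [12.9 17.3]
```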
We checked how important this effect is in our samples, i.e., whether the Fe K-edge EXAFS oscillations affect the shape of the Co K-edge EXAFS. To do so, we constructed an artificial X-ray absorption spectrum by adding the normalized reference spectra for γ-Fe2O3 and Co3O4, collected at the Fe K-edge and Co K-edge, respectively. As discussed in the main text, these reference materials represent the local structure in our catalysts well. The weights of the Fe and Co spectra in these synthetic spectra were varied to model different Fe to Co ratios in our samples. We then extracted the Co K-edge EXAFS from these synthetic spectra and compared it with the true Co K-edge EXAFS spectrum for Co3O4 (Figure S16). Even at the highest Fe to Co ratio (11:1, as in our Co0.25Fe2.75O4 sample), leaking of the Fe K-edge EXAFS causes only very minor distortions of the Co K-edge EXAFS features. This effect also certainly cannot affect our main conclusion for the Co0.25Fe2.75O4 sample, namely […]. For samples with lower Fe to Co ratios, the leaking of the Fe K-edge EXAFS into the Co K-edge EXAFS is completely negligible.
Figure S1. Evolution
Green lines indicate the partial RDFs calculated for tetrahedrally coordinated and octahedrally coordinated metal sites. Red lines correspond to the total metal-oxygen RDFs, with the NN uncertainty indicated as red shaded areas. The orange lines correspond to the partial RDFs calculated for tetrahedrally coordinated and octahedrally coordinated metal sites.
Figure S12. Validation of the NN-EXAFS method using EXAFS spectra for reference materials. The fractions of tetrahedrally coordinated metal sites, as obtained from the NN-EXAFS analysis of Fe K-edge (blue circles), Co K-edge (red circles), and Mn K-edge and Zn K-edge (green circles) data for reference materials.
Black circles show results obtained for synthetic spectra, constructed as linear combinations of the Co K-edge EXAFS spectra for wurtzite-type CoO (tetrahedrally coordinated Co) and CoOOH (octahedrally coordinated Co).
Co0.25Fe2.75O4 catalysts (d). Spectra are shifted vertically for clarity. Insets compare representative spectra for the as-prepared samples, samples under OER conditions and samples after OER. The Fourier transform was carried out in the k-range between 1 and 12 Å⁻¹ in panels (a, b, c). For the Co-poor sample in panel (d), a shorter k-range (between 1 and 9 Å⁻¹) was used because of the lower signal-to-noise ratio. Each depicted spectrum is obtained by averaging the QXAFS data collected within 50 s for panels (a, b, c) and within 200 s for panel (d).
Figure S16. (a) Synthetic XAS spectrum, constructed by adding the normalized reference spectra of γ-Fe2O3 and Co3O4, collected at the Fe K-edge and Co K-edge, respectively. The weights of the Fe K-edge and Co K-edge spectra are 11 to 1, mimicking the Fe to Co ratio in the Co0.25Fe2.75O4 sample. (b) Comparison of the Fourier-transformed Co K-edge EXAFS spectrum for Co3O4 with those for the artificial spectrum shown in (a) and for an artificial XAS spectrum constructed by adding the spectra of γ-Fe2O3 and Co3O4 in a 1 to 1 ratio.
Figure S17. Evolution of the Fe K-edge EXAFS spectra during activation and under OER conditions at 1.8 V vs. RHE for the Co2.25Fe0.75O4 (a), CoFe2O4 (b) and Co0.25Fe2.75O4 (c) catalysts. Spectra are shifted vertically for clarity. Insets compare representative spectra for the as-prepared samples, samples under OER conditions and samples after OER. Each depicted spectrum is obtained by averaging the QXAFS data collected within 50 s for panels (b, c) and within 200 s for panel (a).
Figure S18.
Evolution of the Fourier-transformed Fe K-edge EXAFS spectra during activation and under OER conditions at 1.8 V vs. RHE for the Co2.25Fe0.75O4 (a), CoFe2O4 (b) and Co0.25Fe2.75O4 (c) catalysts. Spectra are shifted vertically for clarity. Insets compare representative spectra for the as-prepared samples, samples under OER conditions and samples after OER. The Fourier transform was carried out in the k-range between 1 and 12 Å⁻¹. Each depicted spectrum is obtained by averaging the QXAFS data collected within 50 s for panels (b, c) and within 200 s for panel (a).
Figure S19. Comparison of the Fourier-transformed Fe K-edge EXAFS spectra for all CoxFeyOz samples with different Co to Fe ratios. Spectra collected for the as-prepared samples under OCP and for the samples under OER conditions at 1.8 V vs. RHE are shown, as well as the spectra for the reference spinel-like oxides γ-Fe2O3 and Fe3O4. The Fourier transform was carried out in the k-range between 1 and 12 Å⁻¹. Spectra are shifted vertically for clarity.